Abstract. We consider the problem of estimating the density $\Pi$ of a
determinantal process $N$ from the observation of $n$ independent copies
of it. We use an aggregation procedure based on robust testing to build
our estimator. We establish non-asymptotic risk bounds with respect to
the Hellinger loss and deduce, when $n$ goes to infinity, uniform rates of
convergence over classes of densities $\Pi$ of interest.

1. Introduction 

The starting point of this work goes back to 2007 when Persi Diaconis visited
our Laboratory Jean-Alexandre Dieudonné in Nice. At that time, he explained
that determinantal processes were emerging in many areas and that there was
no statistical procedure to estimate their distributions. Almost five years later, it
still seems to be the case. The aim of this paper is therefore to contribute to the
study of these processes. Our aim is not only to focus on statistical estimation
but also to discuss some related problems. For example, how can the class
$\mathcal{D}$ of all determinantal densities be parametrized? Is there an identifiable
way of doing it? Another natural question, at least for a statistician, is to understand
how the elements of $\mathcal{D}$ can be approximated. Are there some specific parametric
sets that should be used to approximate the densities lying in $\mathcal{D}$? If so, what
can be said about the approximation properties of these sets? Finally, given $n$
independent copies of a determinantal process $N$, we propose an estimator of
the density $\Pi$ of $N$. We establish non-asymptotic risk bounds for our estimator
and deduce uniform rates of convergence over classes of $\Pi$ of interest. It turns
out that our estimation strategy is robust with respect to the assumption that $N$
is a determinantal process. This means that the risk bounds we get are not only
valid when $\Pi$ belongs to the class $\mathcal{D}$ but also when $\Pi$ is close enough to it (in
the Hellinger distance). Our approach is based on T-estimation as introduced
by Birgé (2006). More precisely, we start with a suitable family of models, which
typically consists of compact sets of densities, and the role of which is to provide
a good approximation of the elements of $\mathcal{D}$. Then, we discretize these models.
This results in a family of points $(\Pi_m)_{m\in\mathfrak{M}}$ of $\mathcal{D}$ and we finally use the data in
order to select a suitable point among the $\Pi_m$. The way we select this point,
which provides our estimator of $\Pi$, is based on robust testing and aims at finding
an element among the $\Pi_m$ which is as close as possible to the target density $\Pi$.
We establish non-asymptotic risk bounds for our estimator and show how they
depend on the approximation properties of the models we started from. Under a
posteriori assumptions on $\Pi$ and for a suitable choice of the models, we specify
these bounds and derive rates of convergence.

For an introduction to determinantal processes, we refer the interested reader
to Lyons (2003), Hough et al. (2006) and the book by Anderson et al. (2010) as
well as the references therein. Part of the popularity of determinantal processes
comes from the fact that they naturally arise in the study of the eigenvalues
of large random matrices. Recently, Borodin et al. (2010) showed that these
processes are also involved in the process of "carries" when adding a column of
numbers.

The paper is organized as follows. In Section 2 we settle the probabilistic
background as well as our main notations and conventions. We introduce
determinantal processes in Section 3 and tackle the problem of estimating their
densities in Section 4. Finally, Section 5 is devoted to the proofs.



2. The background 



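2.1. Notations and conventions. Throughout this paper we use the conventions
$\sum_{\emptyset} = 0$ and $\prod_{\emptyset} = 1$ and set
$\mathbb{N}^* = \mathbb{N}\setminus\{0\}$ and $\mathbb{R}_+^* = (0,+\infty)$. Given
a finite set $A$, $|A|$ denotes the cardinality of $A$ and for $z\in\mathbb{C}$,
$\Re(z)$, $\bar z$ and $|z|$ denote the real part, conjugate and modulus of $z$
respectively. We denote by $\mathcal{P}$ the class of all finite subsets $J$ of
$\mathbb{N}^*$ and set $\mathcal{P}^* = \mathcal{P}\setminus\{\emptyset\}$. Moreover, we set

$$\Lambda = \Big\{\lambda\in[0,1]^{\mathbb{N}^*},\ |\lambda|^2 = \sum_{j\ge 1}\lambda_j^2 < +\infty\Big\}.$$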

All along, we consider a metric space $(\mathcal{X}, d)$ which we endow with its Borel
$\sigma$-field $\mathcal{B}(\mathcal{X})$ and a $\sigma$-finite measure $\mu$. Roughly speaking, a
point process on $(\mathcal{X},\mathcal{B}(\mathcal{X}))$ will correspond to a random choice of a
family of distinct points among $\mathcal{X}$. One should typically think of $\mathcal{X}$ as
$\{1,\dots,p\}$, $\mathbb{N}$, $\mathbb{R}$ or $\mathbb{R}^p$ for some positive integer $p$.
We denote by $\mathbb{H}$ the Hilbert space of measurable and complex-valued functions
$\phi$ on $(\mathcal{X},\mathcal{B}(\mathcal{X}))$ satisfying

$$\|\phi\|^2 = \int_{\mathcal{X}}|\phi|^2\,d\mu < +\infty.$$

We endow $\mathbb{H}$ with the Hermitian inner product defined for $\phi,\psi\in\mathbb{H}$ by

$$\langle\phi,\psi\rangle = \int_{\mathcal{X}}\bar\phi\,\psi\,d\mu.$$

For convenience, we adopt the convention that $\langle\cdot,\cdot\rangle$ is linear with respect to the
second argument and not the first one, as is usually the case. In order to keep our
notation as simple as possible, when $\mathcal{X}$ is finite, say $\mathcal{X} = \{1,\dots,p\}$, we rather
take for $\mathbb{H}$ the space

$$\ell^2(\mathbb{N}^*) = \Big\{\phi\in\mathbb{C}^{\mathbb{N}^*},\ \sum_{i\ge 1}|\phi(i)|^2 < +\infty\Big\}.$$

More precisely, a mapping $\phi$ on $\mathcal{X} = \{1,\dots,p\}$ with values in $\mathbb{C}$ will be viewed
as a sequence $(\phi(i))_{i\ge 1}\in\ell^2(\mathbb{N}^*)$ with $\phi(i) = 0$ for all $i > p$. By doing so, $\mathbb{H}$
is always an infinite-dimensional Hilbert space (whatever $\mathcal{X}$) and we may define
$\mathbb{U}$ as the set of all orthonormal sequences $\Phi = (\phi_j)_{j\ge 1}$ in $\mathbb{H}$. For $J\in\mathcal{P}^*$ and
$\Phi\in\mathbb{U}$, we set $\Phi_J = (\phi_j)_{j\in J}$ and, given an (ordered) finite subset $\alpha$ of $\mathcal{X}$, denote
by $\Phi_{\alpha,J}$ the $|\alpha|\times|J|$ matrix

$$\Phi_{\alpha,J} = (\phi_j(x))_{x\in\alpha,\ j\in J}.$$

We extend this notation to rectangular matrices $A$ with entries in $\mathbb{C}$:
$A_{\alpha,J} = (A_{i,j})_{i\in\alpha,\ j\in J}$. Moreover, $A^*$ denotes the transpose of the
conjugate of $A$, that is, if $A = (A_{i,j})_{i=1,\dots,k,\ j=1,\dots,k'}$ then
$A^* = (\overline{A_{j,i}})_{i=1,\dots,k',\ j=1,\dots,k}$.

Finally, we recall that the Hellinger distance $h$ between two densities $p,q$ on a
measured space $(E,\mathcal{E},\nu)$ is defined by the formula

$$h^2(p,q) = \frac{1}{2}\int_E\big(\sqrt{p}-\sqrt{q}\big)^2\,d\nu.$$

For the sake of simplicity, we shall keep the same notation $h$ throughout this
paper even though the measured space $(E,\mathcal{E},\nu)$ may be different.
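
As a small illustration of the definition above, here is a minimal Python sketch of the squared Hellinger distance when $\nu$ is the counting measure on a finite set. The function name `hellinger_sq` and the representation of densities as lists are assumptions made for this example only.

```python
import math

def hellinger_sq(p, q):
    """Squared Hellinger distance between two densities on the same finite set.

    p and q are sequences of nonnegative weights listed in the same order;
    the reference measure is assumed to be the counting measure.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same finite set")
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
```

With this normalization the distance between two densities with disjoint supports equals 1, which is the convention used throughout the paper.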



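2.2. The probabilistic background. In this section, our aim is to introduce
the probabilistic background we shall use throughout this paper. We denote by
$\mathbb{X}$ the class of all finite subsets of $\mathcal{X}$ and for $k\in\mathbb{N}$, denote by $\mathbb{X}_k$ the class of
those subsets with cardinality $k$. By convention, $\mathbb{X}_0 = \{\emptyset\}$. We identify $\mathbb{X}$ with
the set of finite measures of the form

$$\alpha = \sum_{x\in\alpha}\delta_x \quad\text{with } \alpha\in\mathbb{X}$$

and use the same notation $\alpha$ for the set and for the measure, so that for all
$B\in\mathcal{B}(\mathcal{X})$, $\alpha(B)$ means $|\alpha\cap B|$.
We equip $\mathbb{X}$ with the smallest $\sigma$-field $\mathcal{B}(\mathbb{X})$ for which the mappings

$$M_B:\ \mathbb{X}\to\mathbb{N},\qquad \alpha\mapsto\alpha(B)$$

are measurable for all $B\in\mathcal{B}(\mathcal{X})$. In particular, the subsets $\mathbb{X}_k = M_{\mathcal{X}}^{-1}(\{k\})$ are
measurable for all $k\in\mathbb{N}$. We endow $(\mathbb{X},\mathcal{B}(\mathbb{X}))$ with the measure $L$ defined for
all measurable functions $f$ from $\mathbb{X}$ into $\mathbb{R}_+$ by

$$\int_{\mathbb{X}} f(\alpha)\,dL(\alpha) = f(\emptyset) + \sum_{k\ge 1}\frac{1}{k!}\int_{\mathcal{X}^k} f(\{x_1,\dots,x_k\})\,d\mu(x_1)\dots d\mu(x_k).$$

If $\mathcal{X}$ is finite, say $\mathcal{X} = \{1,\dots,p\}$, and if $\mu$ is the counting measure on $\mathcal{X}$ then
$L$ is merely the counting measure on $\mathbb{X}$.
Throughout this paper, a point process $N$ on $(\mathcal{X},\mathcal{B}(\mathcal{X}))$ is a random variable
defined on a probability space $(\Omega,\mathcal{A},\mathbb{P})$ with values in $(\mathbb{X},\mathcal{B}(\mathbb{X}),L)$.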



3. Introduction to determinantal processes 

In this section, our aim is to define a determinantal process on $(\mathcal{X},\mathcal{B}(\mathcal{X}))$. To
do so, we adopt the point of view developed in Hough et al. (2006). In particular,
we start with the simpler case of determinantal projection processes.

3.1. Determinantal projection processes.

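Definition 1. Given $J\in\mathcal{P}^*$ and $\Phi\in\mathbb{U}$, a determinantal projection process $N$
of rank $|J|$ with parameter $\Phi_J = (\phi_j)_{j\in J}$ is a point process with density (with
respect to $L$) given by

$$\Pi_J^{\Phi}(\alpha) = \big|\det[\Phi_{\alpha,J}]\big|^2\,\mathbf{1}_{\mathbb{X}_{|J|}}(\alpha) \quad\text{for all } \alpha\in\mathbb{X}. \tag{3.1}$$

When $J = \emptyset$, by convention $\Pi_{\emptyset}^{\Phi} = \delta_{\emptyset}$.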

The definition of the density $\Pi_J^{\Phi}$ only depends on the functions $\phi_j$ for $j\in J$ and
hence, up to a re-labelling of the elements of $\Phi$, we may assume with no loss of
generality that $J$ is of the form $\{1,\dots,k\}$ for some $k\ge 1$. At this stage, the
subscript $J$ is therefore unnecessary but it will turn out to be convenient in the next
section in order to define a determinantal process.

Although the matrix $\Phi_{\alpha,J} = (\phi_j(x))_{x\in\alpha,\ j\in J}$ depends on an ordering of the set $\alpha$,
$|\det[\Phi_{\alpha,J}]|$ does not and we shall therefore omit to specify one. It follows from the
definition of $\Pi_J^{\Phi}$ that with probability 1, $|N(\mathcal{X})| = |J|$. Hence, a determinantal
projection process $N$ of rank $|J|$ consists of $|J|$ distinct points of $\mathcal{X}$. The location
of these points depends on the geometry of the $\phi_j$ for $j\in J$. If the $\phi_j$ are
real-valued, a configuration $\alpha = \{x_1,\dots,x_k\}$ is all the more likely as the volume
of the parallelepiped based on the $|J|$ vectors $(\phi_j(x_1),\dots,\phi_j(x_k))_{j\in J}$ is large.

The fact that $\Pi_J^{\Phi}$ is a density on $\mathbb{X}$ might not be clear at first sight. In fact, when
$\mathcal{X} = \{1,\dots,p\}$ this follows from the Cauchy-Binet formula: if $A$, $B$ are $k\times p$ and
$p\times k$ matrices respectively with $p\ge k$, the Cauchy-Binet formula asserts that

$$\det[AB] = \sum_{\alpha\in\mathbb{X}_k}\det\big[A_{\{1,\dots,k\},\alpha}\big]\,\det\big[B_{\alpha,\{1,\dots,k\}}\big]. \tag{3.2}$$

Again, note that this formula is independent of the choice of an ordering on $\alpha$.
By using the Cauchy-Binet formula with $B = (\phi_j(x))_{x\in\mathcal{X},\ j\in J} = \Phi_{\mathcal{X},J}$, $A = B^*$
and by using the fact that the family $(\phi_j)_{j\in J}$ is orthonormal we get

$$\int_{\mathbb{X}}\Pi_J^{\Phi}(\alpha)\,dL(\alpha) = \sum_{\alpha\in\mathbb{X}_{|J|}}\overline{\det[\Phi_{\alpha,J}]}\,\det[\Phi_{\alpha,J}] = \sum_{\alpha\in\mathbb{X}_{|J|}}\det\big[\Phi^*_{J,\alpha}\big]\,\det[\Phi_{\alpha,J}]$$

$$= \det\big[\Phi^*_{J,\mathcal{X}}\,\Phi_{\mathcal{X},J}\big] = \det\big[(\langle\phi_i,\phi_j\rangle)_{i,j\in J}\big] = 1.$$

When $\mathcal{X}$ is no longer finite, the Cauchy-Binet formula can be extended by using
the identity below, from which we can deduce in a similar way as above that $\Pi_J^{\Phi}$
is a density.
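
The normalization argument above can be checked numerically in the finite case. The following is a minimal sketch, assuming $\mathcal{X} = \{0,\dots,p-1\}$ with the counting measure and a randomly drawn orthonormal family; the helper name `projection_density` is hypothetical and not part of the paper.

```python
import itertools
import numpy as np

def projection_density(Phi):
    """Density alpha -> |det Phi[alpha, :]|^2 over all k-subsets alpha.

    Phi is a p x k complex matrix whose columns are assumed orthonormal;
    column j holds the values (phi_j(x)) for x in {0, ..., p-1}.
    """
    p, k = Phi.shape
    return {
        alpha: abs(np.linalg.det(Phi[list(alpha), :])) ** 2
        for alpha in itertools.combinations(range(p), k)
    }

rng = np.random.default_rng(0)
p, k = 5, 2
Z = rng.normal(size=(p, k)) + 1j * rng.normal(size=(p, k))
Phi, _ = np.linalg.qr(Z)  # reduced QR: p x k matrix with orthonormal columns
density = projection_density(Phi)
total_mass = sum(density.values())  # Cauchy-Binet predicts a total mass of 1
```

The sum runs over the $\binom{p}{k}$ subsets of size $k$, so this brute-force check is only meant for small $p$.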

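Proposition 1. Let $\Phi = (\phi_1,\dots,\phi_k)$ and $\Psi = (\psi_1,\dots,\psi_k)$ be two elements
of $\mathbb{H}^k$. We have that

$$\det\big[(\langle\phi_i,\psi_j\rangle)_{i,j=1,\dots,k}\big] = \int_{\mathbb{X}_k}\det\big[\Phi^*_{\{1,\dots,k\},\alpha}\big]\,\det\big[\Psi_{\alpha,\{1,\dots,k\}}\big]\,dL(\alpha).$$

The proof of the proposition is postponed to Section 5.1.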



3.2. The general case. As proved in Hough et al. (2006), the distribution of a
(finite) determinantal process $N$ can be viewed as a mixture of densities of some
determinantal projection processes. More precisely, a determinantal process can
be defined as follows.

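Definition 2. Let $\Phi\in\mathbb{U}$ and $\lambda\in\Lambda$. A determinantal process $N$ with
parameters $(\Phi,\lambda)$ is a point process with density

$$\Pi^{\Phi,\lambda} = \sum_{J\in\mathcal{P}} p_J^{\lambda}\,\Pi_J^{\Phi} \quad\text{where}\quad p_J^{\lambda} = \prod_{j\in J}\lambda_j^2\prod_{j\notin J}(1-\lambda_j^2) \quad\text{for all } J\in\mathcal{P}. \tag{3.3}$$

We use the convention $\Pi_{\emptyset}^{\Phi} = \delta_{\emptyset}$.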

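Since $\lambda_j\in[0,1]$ for all $j\ge 1$ and

$$\sum_{j\ge 1}\lambda_j^2 < +\infty, \tag{3.4}$$

the numbers $p_J^{\lambda}$ are nonnegative and well defined (the infinite product
$\prod_{j\notin J}(1-\lambda_j^2)$ converges for all $J\in\mathcal{P}$). Besides,

$$\sum_{J\in\mathcal{P}} p_J^{\lambda} = \prod_{j\ge 1}\big(\lambda_j^2 + (1-\lambda_j^2)\big) = 1$$

and therefore $\Pi^{\Phi,\lambda}$ is an (at most countable) mixture of densities. Given $J\in\mathcal{P}^*$,
it is not difficult to see that for the particular choice $\lambda = \lambda^J = (\mathbf{1}_{j\in J})_{j\ge 1}$, the
density $\Pi^{\Phi,\lambda}$ is that of a determinantal projection process with parameter $\Phi_J$:
indeed, for $J' = J$, $p_{J'}^{\lambda} = 1$ and for $J'\ne J$, $p_{J'}^{\lambda} = 0$.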



As explained in Hough et al. (2006), another way of defining a determinantal
process is as follows. First simulate a sequence $(Z_j)_{j\ge 1}$ of independent Bernoulli
random variables with respective parameters $(\lambda_j^2)_{j\ge 1}$. Consider the subset $J$ of
those indices $j\ge 1$ such that $Z_j = 1$. Finally choose $N$ according to a
determinantal projection process of rank $|J|$ with parameter $\Phi_J$. With such a
description, Condition (3.4) is easy to understand: together with the Borel-Cantelli
lemma, it ensures that $J$ is finite almost surely. It is also clear that the
distribution of a determinantal process remains unchanged if we change the labelling
of the pairs $((\lambda_j,\phi_j))_{j\ge 1}$. That is, for every bijection $\sigma$ on $\mathbb{N}^*$, the parameters
$((\phi_j)_{j\ge 1},(\lambda_j)_{j\ge 1})$ and $((\phi_{\sigma(j)})_{j\ge 1},(\lambda_{\sigma(j)})_{j\ge 1})$ lead to the same determinantal
distribution. In particular, with no loss of generality, we may assume that the
sequence $\lambda = (\lambda_j)_{j\ge 1}$ is non-increasing with respect to $j$.
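
The mixture weights and the first step of the two-stage description can be sketched in a few lines. This is an illustration only: it assumes a finite sequence $\lambda$ (all remaining entries being zero), uses 0-based indices, and the function names `mixture_weights` and `sample_index_set` are hypothetical.

```python
import itertools
import numpy as np

def mixture_weights(lam):
    """Weights p_J = prod_{j in J} lam_j^2 * prod_{j not in J} (1 - lam_j^2)
    for every subset J of {0, ..., len(lam)-1}."""
    lam2 = np.asarray(lam, dtype=float) ** 2
    p = len(lam2)
    weights = {}
    for r in range(p + 1):
        for J in itertools.combinations(range(p), r):
            factors = [lam2[j] if j in J else 1.0 - lam2[j] for j in range(p)]
            weights[J] = float(np.prod(factors))
    return weights

def sample_index_set(lam, rng):
    """Draw J by keeping index j with probability lam_j^2, independently."""
    lam2 = np.asarray(lam, dtype=float) ** 2
    keep = rng.random(len(lam2)) < lam2
    return tuple(int(j) for j in np.flatnonzero(keep))

weights = mixture_weights([0.9, 0.5, 0.2])
rng = np.random.default_rng(0)
J = sample_index_set([1.0, 0.0, 1.0], rng)  # degenerate case: J is deterministic
```

Once $J$ is drawn, the second step consists of sampling from the projection density of rank $|J|$, which is not shown here.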

In the literature, one usually associates to a determinantal process $N$ a square
integrable kernel $K$ on $\mathcal{X}^2$ which defines a self-adjoint compact operator on $\mathbb{H}$
by the formula

$$T_K:\ \phi\mapsto\Big(x\mapsto\int_{\mathcal{X}}K(x,y)\,\phi(y)\,d\mu(y)\Big). \tag{3.5}$$

The sequences $(\lambda_j^2)_{j\ge 1}$ and $\Phi = (\phi_j)_{j\ge 1}$ mentioned above then correspond
to the eigenvalues and associated eigenvectors of $T_K$. Conversely, given the
sequences $\lambda = (\lambda_j)_{j\ge 1}$ and $\Phi = (\phi_j)_{j\ge 1}$ and provided that $\mu(\mathcal{X}) < +\infty$, the
kernel $K$ can be obtained by the formula (Mercer's Theorem)

$$K(x,y) = \sum_{j\ge 1}\lambda_j^2\,\phi_j(x)\,\overline{\phi_j(y)} \tag{3.6}$$

where the series converges absolutely for almost every $(x,y)\in\mathcal{X}^2$. When
$\mathcal{X} = \{1,\dots,p\}$, $K$ is merely (any) $p\times p$ Hermitian matrix with eigenvalues in $[0,1]$.
Interestingly, the kernel $K$ can be related to the distribution of $N$ by the following
formula which holds for all measurable functions $f$ from $\mathbb{X}$ into $\mathbb{R}_+$:

$$\mathbb{E}\Big[\sum_{\alpha\subset N}f(\alpha)\Big] = \int_{\mathbb{X}}f(\alpha)\,\det[K_{\alpha,\alpha}]\,dL(\alpha) \quad\text{where}\quad K_{\alpha,\alpha} = (K(x,y))_{x\in\alpha,\ y\in\alpha}.$$

The mapping $\alpha\mapsto\det[K_{\alpha,\alpha}]$ determines the distribution of $N$ and is called the
correlation function. When $\mathcal{X} = \{1,\dots,p\}$, this formula simply says that for all
$\alpha\subset\mathcal{X}$,

$$\mathbb{P}[\alpha\subset N] = \det[K_{\alpha,\alpha}].$$
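
In the finite case, the identity between inclusion probabilities and principal minors of the kernel can be verified by brute force. The sketch below assumes $\mathcal{X} = \{0,\dots,p-1\}$ with the counting measure, eigenvalues $\lambda_j^2$ and a kernel matrix built as $K = \sum_j \lambda_j^2\phi_j\phi_j^*$; the function names are hypothetical.

```python
import itertools
import numpy as np

def subsets(p, r):
    return itertools.combinations(range(p), r)

def determinantal_density(Phi, lam):
    """P(N = alpha) for every subset alpha, as a mixture of projection densities.

    Phi is a p x p unitary matrix (column j is phi_j), lam has length p.
    """
    p = Phi.shape[0]
    lam2 = np.asarray(lam, dtype=float) ** 2
    density = {}
    for r in range(p + 1):
        for alpha in subsets(p, r):
            total = 0.0
            for J in subsets(p, r):
                p_J = np.prod([lam2[j] if j in J else 1.0 - lam2[j] for j in range(p)])
                # Convention: the empty configuration has projection density 1.
                minor = 1.0 if r == 0 else abs(np.linalg.det(Phi[np.ix_(alpha, J)])) ** 2
                total += p_J * minor
            density[alpha] = float(total)
    return density

rng = np.random.default_rng(3)
p = 4
Z = rng.normal(size=(p, p)) + 1j * rng.normal(size=(p, p))
Phi, _ = np.linalg.qr(Z)
lam = rng.uniform(size=p)
density = determinantal_density(Phi, lam)
K = Phi @ np.diag(lam ** 2) @ Phi.conj().T  # Hermitian, eigenvalues lam_j^2

max_gap = 0.0
for r in range(1, p + 1):
    for alpha in subsets(p, r):
        inclusion = sum(m for beta, m in density.items() if set(alpha) <= set(beta))
        minor = np.linalg.det(K[np.ix_(alpha, alpha)]).real
        max_gap = max(max_gap, abs(inclusion - minor))
total_mass = sum(density.values())
```

The quantity `max_gap` measures the largest discrepancy between $\mathbb{P}[\alpha\subset N]$ computed from the mixture density and $\det[K_{\alpha,\alpha}]$; it should vanish up to rounding errors.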



3.3. Hellinger distance and determinantal process. In the previous section,
we have seen that the distribution of a determinantal process can be parametrized
by a pair $(\Phi,\lambda)$ in $\mathbb{U}\times\Lambda$ and that, conversely, any choice of such a pair allows one to
define a determinantal process. The aim of this section is to relate the Hellinger
distance between the distributions of two determinantal processes associated to
two distinct pairs $(\Phi,\lambda)$ and $(\Psi,\gamma)$ to some distance between these pairs. Again,
we start with the simpler case of a determinantal projection process.



3.3.1. Case of a determinantal projection process. Let $\Phi = (\phi_j)_{j\ge 1}$ and
$\Psi = (\psi_j)_{j\ge 1}$ be two elements of $\mathbb{U}$ and $J$, $J'$ two elements of $\mathcal{P}$. If $|J|\ne|J'|$, the
supports of $\Pi_J^{\Phi}$ and $\Pi_{J'}^{\Psi}$ are disjoint (the densities are supported by $\mathbb{X}_{|J|}$ and
$\mathbb{X}_{|J'|}$ respectively) and hence $h^2(\Pi_J^{\Phi},\Pi_{J'}^{\Psi}) = 1$. If $J = J' = \emptyset$, $\Pi_J^{\Phi} = \Pi_{J'}^{\Psi}$ and
therefore $h^2(\Pi_J^{\Phi},\Pi_{J'}^{\Psi}) = 0$. Consequently, the only case we need to consider
is that where $|J| = |J'|\ge 1$. In fact, as already mentioned, we may re-index
one of the two sequences, say $\Psi$, in order to have $J = J'$ without changing the
distribution $\Pi_{J'}^{\Psi}$. By doing so, the following result holds.

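Proposition 2. Let $J\in\mathcal{P}^*$ and $\Phi,\Psi\in\mathbb{U}$. We have,

$$h^2(\Pi_J^{\Phi},\Pi_J^{\Psi}) = 1-\int_{\mathbb{X}_{|J|}}\big|\det[\Phi_{\alpha,J}]\big|\,\big|\det[\Psi_{\alpha,J}]\big|\,dL(\alpha) \le 1-\Big|\det\big[(\langle\phi_i,\psi_j\rangle)_{i,j\in J}\big]\Big|. \tag{3.7}$$

Moreover,

$$h^2(\Pi_J^{\Phi},\Pi_J^{\Psi}) \le \frac{5}{2}\sum_{j\in J}\|\phi_j-\psi_j\|^2.$$

Up to constants, the last inequality is sharp. For example, if $J = \{1\}$, $\Pi_{\{1\}}^{\Phi}$ and
$\Pi_{\{1\}}^{\Psi}$ correspond to the two densities on $(\mathcal{X},\mathcal{B}(\mathcal{X}))$ given by

$$\Pi_{\{1\}}^{\Phi}(x) = |\phi_1(x)|^2 \quad\text{and}\quad \Pi_{\{1\}}^{\Psi}(x) = |\psi_1(x)|^2 \quad\text{for all } x\in\mathcal{X} \tag{3.8}$$

and hence, if $\phi_1,\psi_1$ are two nonnegative real-valued functions on $(\mathcal{X},\mathcal{B}(\mathcal{X}))$,

$$h^2(\Pi_{\{1\}}^{\Phi},\Pi_{\{1\}}^{\Psi}) = \frac{1}{2}\int_{\mathcal{X}}(\phi_1-\psi_1)^2\,d\mu = \frac{1}{2}\|\phi_1-\psi_1\|^2.$$

Clearly, this equality is no longer true when the nonnegativity assumption on $\phi_1$
and $\psi_1$ is violated. Nevertheless, Proposition 2 says that the inequality remains
true (up to a constant). The proof of this proposition is postponed to Section 5.2.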
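
The first bound of Proposition 2 lends itself to a quick numerical sanity check in the finite case. The sketch below assumes $\mathcal{X} = \{0,\dots,p-1\}$ with the counting measure, $J = \{1,\dots,k\}$ and two random orthonormal families; the helper names are hypothetical.

```python
import itertools
import numpy as np

def random_orthonormal(p, k, rng):
    """p x k complex matrix with orthonormal columns (reduced QR of a Gaussian matrix)."""
    Z = rng.normal(size=(p, k)) + 1j * rng.normal(size=(p, k))
    Q, _ = np.linalg.qr(Z)
    return Q

def abs_minors(Phi):
    """Vector of |det Phi[alpha, :]| over all k-subsets alpha, in a fixed order."""
    p, k = Phi.shape
    return np.array([abs(np.linalg.det(Phi[list(a), :]))
                     for a in itertools.combinations(range(p), k)])

rng = np.random.default_rng(1)
Phi = random_orthonormal(6, 3, rng)
Psi = random_orthonormal(6, 3, rng)

a, b = abs_minors(Phi), abs_minors(Psi)
h2 = 1.0 - float(np.sum(a * b))               # Hellinger distance via the affinity
h2_direct = 0.5 * float(np.sum((a - b) ** 2))  # same quantity from the definition
gram = Phi.conj().T @ Psi                      # entries <phi_i, psi_j>
upper = 1.0 - abs(np.linalg.det(gram))         # right-hand side of the first bound
```

The two expressions `h2` and `h2_direct` agree because both densities sum to one, and `h2` should never exceed `upper`.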

3.3.2. The general case. Since $\Pi^{\Phi,\lambda}$ and $\Pi^{\Psi,\gamma}$ are mixtures, the problem of
bounding the Hellinger distance between these two densities amounts to
understanding how, more generally, the Hellinger distance behaves with respect to
mixtures of densities. More precisely, let $p,q$ be two densities on the measured
space $(T,\mathcal{T},m)$ and $(P_t)_{t\in T}$ and $(Q_t)_{t\in T}$ two families of densities on a
measured space $(E,\mathcal{A},\nu)$. What can we say about the Hellinger distance between
the two mixtures

$$P = \int_T P_t\,p(t)\,dm(t) \quad\text{and}\quad Q = \int_T Q_t\,q(t)\,dm(t)$$

when we know how far $p$ is from $q$ and the $P_t$ from the $Q_t$? The following
result gives an answer.

Proposition 3. If $m$ and $\nu$ are both $\sigma$-finite,

$$h^2(P,Q) \le 2h^2(p,q) + 2\int_T h^2(P_t,Q_t)\,q(t)\,dm(t).$$

The proof of this result is postponed to Section 5.3.
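
For intuition, the mixture inequality of Proposition 3 can be checked on a discrete example. The sketch below assumes that $T$ and $E$ are finite sets with counting measures and that all densities are drawn at random; the variable names are chosen for this example only.

```python
import numpy as np

def hellinger_sq(p, q):
    """Squared Hellinger distance between two densities on a finite set."""
    return 0.5 * float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

rng = np.random.default_rng(2)
n_components, n_points = 4, 6

# Mixing densities p, q on T and component densities P_t, Q_t on E.
p = rng.dirichlet(np.ones(n_components))
q = rng.dirichlet(np.ones(n_components))
P_t = rng.dirichlet(np.ones(n_points), size=n_components)
Q_t = rng.dirichlet(np.ones(n_points), size=n_components)

P = p @ P_t  # mixture sum_t p(t) P_t
Q = q @ Q_t  # mixture sum_t q(t) Q_t

lhs = hellinger_sq(P, Q)
rhs = 2 * hellinger_sq(p, q) + 2 * sum(
    q[t] * hellinger_sq(P_t[t], Q_t[t]) for t in range(n_components)
)
```

A single random draw is of course no proof; it only illustrates which quantities enter each side of the inequality.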

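We may apply Proposition 3 with the choices $T = \mathcal{P}$ ($m$ being the counting
measure on $\mathcal{P}$), $(E,\mathcal{A},\nu) = (\mathbb{X},\mathcal{B}(\mathbb{X}),L)$, $p = p^{\lambda}$, $q = p^{\gamma}$ and for $t = J\in\mathcal{P}$,
$P_t = \Pi_J^{\Phi}$ and $Q_t = \Pi_J^{\Psi}$. We obtain the following result, the proof of which is
detailed in Section 5.4.

Proposition 4. Let $\Phi,\Psi\in\mathbb{U}$ and $\lambda,\gamma\in\Lambda$ and set (componentwise)

$$\bar\lambda = 1-\sqrt{1-\lambda^2} \quad\text{and}\quad \bar\gamma = 1-\sqrt{1-\gamma^2}.$$

The following inequalities hold:

$$h^2(p^{\lambda},p^{\gamma}) \le |\lambda-\gamma|^2 + |\bar\lambda-\bar\gamma|^2, \tag{3.9}$$

$$\sum_{J\in\mathcal{P}}p_J^{\gamma}\,h^2(\Pi_J^{\Phi},\Pi_J^{\Psi}) \le \frac{5}{2}\sum_{j\ge 1}\gamma_j^2\,\|\phi_j-\psi_j\|^2. \tag{3.10}$$

In particular,

$$h^2(\Pi^{\Phi,\lambda},\Pi^{\Psi,\gamma}) \le 2\Big[|\lambda-\gamma|^2 + |\bar\lambda-\bar\gamma|^2\Big] + 5\sum_{j\ge 1}\gamma_j^2\,\|\phi_j-\psi_j\|^2. \tag{3.11}$$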



4. Statistical estimation 

Throughout this section, we consider a point process $N$ on $(\mathcal{X},\mathcal{B}(\mathcal{X}))$ with
density $\Pi$ with respect to $L$. Given $n$ independent copies $N_1,\dots,N_n$ of $N$, our
aim is to estimate $\Pi$. One may naturally think of $N$ as being a determinantal
process, which means that $\Pi$ belongs to the set $\mathcal{D}$ of all determinantal
distributions. Nevertheless, our result is robust with respect to such an assumption in
the sense that $\Pi$ may not belong to $\mathcal{D}$. In this case, one may rather consider $\mathcal{D}$
as an approximation set for $\Pi$. Before turning to the estimation of $\Pi$, we first
discuss some identifiability issues.



4.1. Identifiability and exterior algebra. When $N$ is a determinantal process,
we may write $\Pi = \Pi^{\Phi,\lambda}$ for some pair $(\Phi,\lambda)\in\mathbb{U}\times\Lambda$ or, alternatively, define $\Pi$
from some kernel $K$ on $\mathcal{X}^2$ as in (3.6). Although these two approaches provide a
parametrization of $\mathcal{D}$, neither is identifiable. More precisely, two distinct pairs in
$\mathbb{U}\times\Lambda$ or two distinct kernels may parametrize the same determinantal
distribution. This lack of identifiability is already true if one restricts to the simpler class
of determinantal projection processes. A simple counter-example can be obtained
from (3.8) with $(\mathcal{X},\mathcal{B}(\mathcal{X}),\mu) = ([0,1],\mathcal{B}([0,1]),dx)$ by taking $\phi_1(x) = e^{ix}$ and
$\psi_1(x) = e^{2ix}$. In this case, the corresponding kernels $K_1(x,y) = e^{i(x-y)}$ and
$K_2(x,y) = e^{2i(x-y)}$ are distinct but both parametrize the uniform distribution
on $[0,1]$. It is also clear from this counter-example that there is no hope to
estimate $\phi_1$, which is not identifiable either.
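
The counter-example can be made concrete with a few lines of code. The sketch below evaluates two rank-one kernels built from unimodular exponentials on $[0,1]$ and the corresponding densities at arbitrary points; the function names and the evaluation points are assumptions made for this illustration.

```python
import cmath

def phi1(x):
    return cmath.exp(1j * x)

def psi1(x):
    return cmath.exp(2j * x)

def rank_one_kernel(f, x, y):
    """Kernel K(x, y) = f(x) * conj(f(y)) of a rank-one projection process."""
    return f(x) * f(y).conjugate()

x, y = 0.3, 0.8
k1 = rank_one_kernel(phi1, x, y)   # e^{i(x - y)}
k2 = rank_one_kernel(psi1, x, y)   # e^{2i(x - y)}
density_phi = abs(phi1(x)) ** 2    # density of the first process at x
density_psi = abs(psi1(x)) ** 2    # density of the second process at x
```

The kernels differ off the diagonal while both densities are identically equal to one on $[0,1]$, so the data cannot distinguish the two parametrizations.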

Consequently, a question arises. How can we define a one-to-one parametrization
of $\mathcal{D}$? As we shall see, this problem is rather difficult. For the sake of simplicity,
we restrict ourselves to the case where $\mathcal{X} = \{1,\dots,p\}$ and focus on the class $\mathcal{D}_{p,k}$
of all determinantal projection distributions of rank $k$ with $k\in\{1,\dots,p-1\}$
and $p\ge 2$. It follows from Definition 1 that for each element $\Pi\in\mathcal{D}_{p,k}$, there
exists an orthonormal family $\phi_1,\dots,\phi_k$ (which is certainly not unique) such that
for all $\alpha\in\mathbb{X}_k$

$$\Pi(\alpha) = \big|\det[\Phi_{\alpha,\{1,\dots,k\}}]\big|^2. \tag{4.1}$$

Let us now consider the exterior algebra $E = \bigwedge^k\mathbb{C}^p$ consisting of the sums
of $k$-blades $\phi_1\wedge\dots\wedge\phi_k$ with $\phi_1,\dots,\phi_k\in\mathbb{C}^p$. Denoting by $e_1,\dots,e_p$ the
canonical basis of $\mathbb{C}^p$, this exterior algebra $E$ can be viewed as a $\mathbb{C}$-linear space,
a basis of which is given by the $k$-blades of the form

$$e_\alpha = e_{i_1}\wedge\dots\wedge e_{i_k}$$

where $\alpha = \{i_1,\dots,i_k\}$ (with $1\le i_1 < i_2 < \dots < i_k\le p$) varies among $\mathbb{X}_k$. This
linear space can be equipped with a Hermitian inner product $[\cdot,\cdot]$ for which the
elements $(e_\alpha)_{\alpha\in\mathbb{X}_k}$ provide an orthonormal basis of $E$. Besides, for a $k$-blade
$\phi_1\wedge\dots\wedge\phi_k$,

$$[e_\alpha,\ \phi_1\wedge\dots\wedge\phi_k] = \det\big[\Phi_{\alpha,\{1,\dots,k\}}\big] \quad\text{for all } \alpha\in\mathbb{X}_k. \tag{4.2}$$

Let $S_E$ be the unit sphere of $(E,[\cdot,\cdot])$, $G$ the subset of $S_E$ gathering the elements
of the form $\phi_1\wedge\dots\wedge\phi_k$ for $\phi_1,\dots,\phi_k$ being an orthonormal family of $\mathbb{C}^p$ and
$G_+$ be the subset of $S_E$ defined by

$$G_+ = \Big\{g_+ = \sum_{\alpha\in\mathbb{X}_k}\big|[e_\alpha,g]\big|\,e_\alpha\ \Big|\ g\in G\Big\}.$$

It follows from (4.1) and (4.2) that the mapping

$$G_+\to\mathcal{D}_{p,k},\qquad g_+\mapsto\Pi_{g_+}:\ \alpha\mapsto\big|[e_\alpha,g]\big|^2 = \big|[e_\alpha,g_+]\big|^2$$

is surjective. It is also clearly one-to-one and thus provides an identifiable
parametrization of the elements of $\mathcal{D}_{p,k}$ by those of $G_+$. In fact, if $\Delta$ denotes
the Hermitian distance on $E$ defined for $g,g'\in E$ by $\Delta^2(g,g') = [g-g',g-g']$,
$(G_+,\Delta)$ and $(\mathcal{D}_{p,k},\sqrt{2}h)$ are isometric: by (3.7), for all $g_+,g'_+\in G_+$,

$$\Delta^2(g_+,g'_+) = \sum_{\alpha\in\mathbb{X}_k}\big([g_+,e_\alpha]-[g'_+,e_\alpha]\big)^2 = 2h^2(\Pi_{g_+},\Pi_{g'_+}).$$
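
The last identity can be unpacked in one line. Writing $a_\alpha$ and $b_\alpha$ for the (nonnegative) coordinates of the two elements of $G_+$ in the basis $(e_\alpha)$, which is a notation introduced here for illustration only, and using that both elements have unit norm, a sketch of the computation reads:

```latex
\Delta^2(g_+,g'_+)
  = \sum_{\alpha}(a_\alpha-b_\alpha)^2
  = \sum_{\alpha}a_\alpha^2+\sum_{\alpha}b_\alpha^2-2\sum_{\alpha}a_\alpha b_\alpha
  = 2\Big(1-\sum_{\alpha}a_\alpha b_\alpha\Big)
  = 2\,h^2\big(\Pi_{g_+},\Pi_{g'_+}\big).
```

The final step uses the expression of the Hellinger distance between two projection densities as one minus the sum of the products of the moduli of the corresponding determinants.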

The metric dimension (in the sense given in Birgé (2006)) of a set of densities is
usually closely related to the minimax rate of estimation over this set. Roughly
speaking, if the metric dimension of the set is $D$, one can expect the minimax
rate to be of order $D/n$. The above isometry shows that the metric dimension $D_{p,k}$
of $(\mathcal{D}_{p,k},h)$ is the same as that of the subset $G_+$ of the linear space $E$ for the
Hermitian distance. In particular $D_{p,k}$ is not larger than the dimension of $E$ (in
the usual sense, viewed as a linear space on $\mathbb{R}$), that is $D_{p,k}\le 2\binom{p}{k}$. This upper
bound is unfortunately very crude and we shall see that the minimax rates can
be much faster. We believe that the metric dimension of $G_+$ is actually of order
$kp$ even though we have not been able to prove it.

4.2. The main result. Let us now turn to the statistical part of this paper. As
already mentioned, our aim is to estimate the density $\Pi$ of a point process $N$
from the observation of $n$ independent copies of it. Our estimation strategy is
based on T-estimation. More precisely, we start with an at most countable family
$\{\Pi_m,\ m\in\mathfrak{M}\}$ of determinantal densities, the choice of which will be explained
below, and we use a test possessing robustness properties in view of selecting the
closest element to $\Pi$ among the $\Pi_m$. We shall not detail the statistical procedure
here and rather refer the reader to Birgé (2006) (Theorem 9) or Baraud (2011)
(Corollary 5). By applying such a selection rule, we obtain the following result.

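Proposition 5. Let $\{\Pi_m,\ m\in\mathfrak{M}\}$ be an at most countable family of densities
on $(\mathbb{X},\mathcal{B}(\mathbb{X}),L)$ and $\pi$ a sub-probability on $\mathfrak{M}$, that is

$$\sum_{m\in\mathfrak{M}}\pi(m)\le 1 \quad\text{and}\quad \pi(m)\ge 0 \quad\text{for all } m\in\mathfrak{M}.$$

There exist a universal constant $C > 0$ and an estimator $\hat\Pi$ solely based on
$N_1,\dots,N_n$ such that whatever the density $\Pi$,

$$C\,\mathbb{E}\big[h^2(\Pi,\hat\Pi)\big] \le \inf_{m\in\mathfrak{M}}\Big[h^2(\Pi,\Pi_m) + \frac{\log(1/\pi(m))}{n}\Big].$$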

Before turning to the choice of the family $\{\Pi_m,\ m\in\mathfrak{M}\}$, let us comment on
the role of $\pi$ in our result. When $\pi$ is a probability, it can be interpreted as
a prior on the family $\{\Pi_m,\ m\in\mathfrak{M}\}$ and thus gives a Bayesian flavor to our
approach. Intuitively, our procedure tends to favor densities $\Pi_m$ associated
to values of $\pi(m)$ which are not too small.

We design our family $\{\Pi_m,\ m\in\mathfrak{M}\}$ in view of possessing good approximation
properties with respect to the elements of the class $\mathcal{D}$. Inequality (3.11) tells
us that one can approximate a determinantal density $\Pi^{\Phi,\lambda}$ (with respect to the
Hellinger distance) by suitably approximating the sequence $\lambda$ and the functions
$\phi_j$ of $\Phi$ corresponding to those indices $j$ for which $\lambda_j$ is large enough. To do so,
we introduce compact subsets of $\Lambda$ and $\mathbb{H}$ respectively defined as follows.
Concerning $\lambda = (\lambda_j)_{j\ge 1}$, with no loss of generality, we may assume the sequence is
non-increasing with respect to $j$ and it is therefore natural to introduce compact
sets of the form

$$\Lambda_j = \{\gamma\in\Lambda,\ \gamma_{j'} = 0 \text{ for all } j' > j\}$$

for different values of $j\ge 1$. This amounts to approximating $\lambda$ by the truncated
sequence keeping the $j$ first entries of $\lambda$, the others being set to 0. In order to
approximate the $\phi_j$, we introduce an at most countable family $\mathcal{H} = (H_m)_{m\in\mathcal{M}}$
of compact subsets of the unit sphere $\mathcal{S}$ of $\mathbb{H}$. Examples of such compact sets
will be given in Section 4.3 for the purpose of providing rates of convergence.
Given a compact subset $H$ of $\mathbb{H}$ and $\eta > 0$, we denote by $H[\eta]$ a maximal
$\eta$-separated subset of $H$. By applying Proposition 5 to a suitable discretization of
the compact sets $\Lambda_j$ and $H_m$, we deduce the result below. Its proof is detailed
in Section 5.5.
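Theorem 1. Let $\mathcal{H} = (H_m)_{m\in\mathcal{M}}$ be an at most countable family of compact
subsets $H_m$ of the unit sphere $\mathcal{S}$ of $(\mathbb{H},\|\cdot\|)$ and let $\pi$ be a sub-probability on
$\mathcal{M}$. There exists a density estimator $\hat\Pi$ such that whatever the density $\Pi$ on
$(\mathbb{X},\mathcal{B}(\mathbb{X}),L)$,

$$C\,\mathbb{E}\big[h^2(\Pi,\hat\Pi)\big] \le \inf_{\Phi\in\mathbb{U},\ \lambda\in\Lambda}\bigg[h^2(\Pi,\Pi^{\Phi,\lambda}) + \inf_{j\ge 1}\bigg(\sum_{j'=1}^{j}O(\mathcal{H},\pi,\phi_{j'}) + \sum_{j'>j}\lambda_{j'}^2\bigg)\bigg]$$

where for all $j\ge 1$,

$$O(\mathcal{H},\pi,\phi_j) = \inf_{m\in\mathcal{M}}\bigg[\inf_{\psi\in H_m}\|\phi_j-\psi\|^2 + \frac{1}{n}\log\bigg(\frac{|H_m[1/\sqrt{n}]|\,n}{\pi(m)}\bigg)\bigg]$$

and $C$ is a positive universal constant.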



| 2 + -log 

n 



\H m [l/^i}\n 
7r(m) 



Let us now comment on this risk bound. The term $h^2(\Pi,\Pi^{\Phi,\lambda})$ corresponds to
the approximation of $\Pi$ by an element of $\mathcal{D}$. It expresses the fact that our
estimation procedure is robust with respect to the assumption that $\Pi$ belongs to $\mathcal{D}$.
The quantity $\sum_{j'>j}\lambda_{j'}^2$ is the bias term that we get for approximating $\lambda$ by the
elements of $\Lambda_j$. The quantities $O(\mathcal{H},\pi,\phi_{j'})$ correspond to the bound we would
get for estimating the function $\phi_{j'}$ alone by a model selection procedure among
$\mathcal{H}$ (up to possible extra logarithmic factors). The sum $\sum_{j'=1}^{j}O(\mathcal{H},\pi,\phi_{j'})$ is
therefore the risk bound we get for estimating the $j$ first elements $\phi_1,\dots,\phi_j$ of
$\Phi = (\phi_{j'})_{j'\ge 1}$. In order to specify these quantities, let us turn to the following
typical situation.

Let $(S_m)_{m\in\mathcal{M}}$ be a family of finite-dimensional subspaces of $\mathbb{H}$ with respective
dimension $D_m\ge 1$ (viewed as a linear space on $\mathbb{R}$) and for $m\in\mathcal{M}$, let us take
$H_m = \mathcal{S}\cap S_m$. The following results hold.

Proposition 6. For all $n\ge 1$,

$$\log\big|H_m[1/\sqrt{n}]\big| \le D_m\log(2\sqrt{n}+1). \tag{4.3}$$

Besides, for all $\phi\in\mathcal{S}$,

$$\inf_{\psi\in H_m}\|\phi-\psi\| \le 4\inf_{\psi\in S_m}\|\phi-\psi\|. \tag{4.4}$$

The first inequality gives a control of the maximal size of a $1/\sqrt{n}$-separated
subset of $H_m$. The second one shows that $H_m$ and $S_m$ share similar
approximation properties with respect to the elements of $\mathcal{S}$. The proof of the proposition
is delayed to Section 5.6. With such a result, we deduce from Theorem 1 the
following corollary.

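Corollary 1. Let $\mathbb{S} = (S_m)_{m\in\mathcal{M}}$ be an at most countable family of
finite-dimensional subspaces of $(\mathbb{H},\|\cdot\|)$ with dimensions $D_m\ge 1$ and let $\pi$ be a
sub-probability on $\mathcal{M}$. There exists a density estimator $\hat\Pi$ such that whatever the
density $\Pi$ on $(\mathbb{X},\mathcal{B}(\mathbb{X}),L)$,

$$C\,\mathbb{E}\big[h^2(\Pi,\hat\Pi)\big] \le \inf_{\Phi\in\mathbb{U},\ \lambda\in\Lambda}\bigg[h^2(\Pi,\Pi^{\Phi,\lambda}) + \inf_{j\ge 1}\bigg(\sum_{j'=1}^{j}O(\mathbb{S},\pi,\phi_{j'}) + \sum_{j'>j}\lambda_{j'}^2\bigg)\bigg]$$

where for all $j\ge 1$,

$$O(\mathbb{S},\pi,\phi_j) = \inf_{m\in\mathcal{M}}\bigg[\inf_{\psi\in S_m}\|\phi_j-\psi\|^2 + \frac{D_m\log n + \log(1/\pi(m))}{n}\bigg]$$

and $C$ is a positive universal constant.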



For illustration, let us consider the elementary situation where $\mathcal{X} = \{1,\dots,p\}$
and assume that $\Pi\in\mathcal{D}_{p,k}$ is the density of a determinantal projection process on
$\mathcal{X}$ of rank $k$ as in Section 4.1. In this case, one can choose $\mathbb{S} = \{S\}$ where $S$ is the
linear subspace of dimension $p$ of $\mathbb{H} = \ell^2(\mathbb{N}^*)$ gathering the elements of the form
$(u_1,\dots,u_p,0,\dots)$ with $(u_1,\dots,u_p)\in\mathbb{C}^p$ and $\pi$ the Dirac mass on $S$. Since $\Pi$
can be described by an orthonormal family $\phi_1,\dots,\phi_k$ of $S$, by using Corollary 1
we derive the risk bound

$$C\,\mathbb{E}\big[h^2(\Pi,\hat\Pi)\big] \le \frac{kp\log n}{n} \quad\text{for all } \Pi\in\mathcal{D}_{p,k}.$$

This inequality shows that the minimax rate of estimation over $\mathcal{D}_{p,k}$ is not larger
than $kp\log n/n$. Since we expect that the metric dimension of $\mathcal{D}_{p,k}$ is of order
$kp$, we believe that the logarithmic factor could probably be dropped.



4.3. Rates of convergence. In this section, we assume that $\mathcal{X} = [0,1]^k$ for
some integer $k\ge 1$. Our aim is to deduce from Corollary 1 some rates of
convergence towards $\Pi$ when it is of the form $\Pi^{\Phi,\lambda}$ for some parameter $(\Phi,\lambda)\in\mathbb{U}\times\Lambda$.
To do so, we make some a posteriori smoothness assumptions on $\Phi = (\phi_j)_{j\ge 1}$.
More precisely, we assume that the $\phi_j$ are real-valued and belong to classes
$B^{\beta}_{p,p}([0,1]^k)$ of (possibly) anisotropic real-valued Besov functions indexed by a
number $p\in(0,+\infty]$ and a smoothness parameter $\beta = (\beta_i)_{i=1,\dots,k}\in(0,+\infty)^k$.
When $p = +\infty$, $B^{\beta}_{\infty,\infty}([0,1]^k)$ is merely the class of anisotropic $\beta$-Hölderian
functions on $[0,1]^k$, which means that a function in $B^{\beta}_{\infty,\infty}([0,1]^k)$ is $\beta_i$-Hölderian
on $[0,1]$ when we keep all the coordinates fixed except the $i$-th. For a more
precise definition of these smoothness classes we refer to Hochmuth (2002), at least
when $k = 2$. The definition there can easily be generalized to larger values of
$k$. Denoting by $|\phi|_{\beta,p,p}$ the Besov semi-norm of a function $\phi$ in $B^{\beta}_{p,p}([0,1]^k)$, we
set for any $R > 0$

$$\mathbb{U}^{\beta}_{p}(R) = \Big\{\Phi = (\phi_j)_{j\ge 1}\in\mathbb{U}\ \Big|\ \phi_j\in B^{\beta}_{p,p}([0,1]^k),\ |\phi_j|_{\beta,p,p}\le R,\ \forall j\ge 1\Big\}.$$



In order to approximate the elements of such classes, we use the following result
of Akakpo feOOSl ). 
Proposition 7. Let $p > 0$, $k\in\mathbb{N}^*$ and $r\in\mathbb{N}$. There exist a collection of linear
spaces $(S_m)_{m\in\mathcal{M}_r}$ and a positive number $C_{k,r}$ such that for all positive integers
$D$,

$$\big|\{m\in\mathcal{M}_r,\ D_m = D\}\big| \le e^{C_{k,r}D} \tag{4.5}$$

(4.6) inf inf ||0 - V|| < r^H^Z^/* 
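for all $\phi\in B^{\beta}_{p,p}([0,1]^k)$ and $\beta$ satisfying

$$\sup_{1\le i\le k}\beta_i < r+1 \quad\text{and}\quad \bar\beta = \Big(\frac{1}{k}\sum_{i=1}^{k}\frac{1}{\beta_i}\Big)^{-1} > k\big[(p^{-1}-2^{-1})\vee 0\big]. \tag{4.7}$$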

Hereafter, we consider the family of linear spaces $(S_m)_{m\in\mathcal{M}}$ with $\mathcal{M} = \bigcup_{r\ge 0}\mathcal{M}_r$
and the sub-probability $\pi$ on $\mathcal{M}$ defined by

$$\pi(m) = e^{-(1+C_{k,r})D_m-r} \quad\text{for all } m\in\mathcal{M}_r \text{ and } r\in\mathbb{N}.$$

From (4.5), it is easy to check that $\pi$ is a sub-probability on $\mathcal{M}$.
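
For completeness, here is a sketch of the check, under the reading that the weights are $\pi(m) = e^{-(1+C_{k,r})D_m - r}$ for $m\in\mathcal{M}_r$, that the dimensions $D_m$ are integers not smaller than 1 and that $r$ ranges over all nonnegative integers. These conventions are assumptions inferred from the surrounding text.

```latex
\sum_{m\in\mathcal{M}}\pi(m)
  \le \sum_{r\ge 0}e^{-r}\sum_{D\ge 1}\big|\{m\in\mathcal{M}_r,\ D_m=D\}\big|\,e^{-(1+C_{k,r})D}
  \le \sum_{r\ge 0}e^{-r}\sum_{D\ge 1}e^{-D}
  = \frac{e}{e-1}\cdot\frac{1}{e-1}
  = \frac{e}{(e-1)^2} < 1 .
```

The second inequality uses the cardinality bound (4.5), and the numerical value of the last ratio is about $0.92$.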

Let us first consider the case where $\Pi = \Pi^{\Phi}_{\{1,\dots,j_0\}}$ is the density of a determinantal
projection process of rank $j_0\ge 1$ with unknown parameter
$\Phi_{\{1,\dots,j_0\}} = (\phi_j)_{j=1,\dots,j_0}$. By using Corollary 1
with the family $(S_m)_{m\in\mathcal{M}}$ and the sub-probability $\pi$, we deduce the following
result. Its proof is delayed to Section 5.7.

Proposition 8. There exists an estimator $\hat\Pi$ such that for all $j_0\ge 1$, $R > 0$,
$\beta\in(0,+\infty)^k$ and $p\in(0,+\infty]$ such that $\bar\beta > k\big[(p^{-1}-2^{-1})\vee 0\big]$, we have



supE 



log re \ 2/3+fc 



where $C$ denotes some positive number depending on $k$, $R$, $p$ and $\beta$ only.

When $j_0 = 1$, $\Pi^{\Phi}_{\{1\}}$ is merely a density on $(\mathcal{X},\mathcal{B}(\mathcal{X}))$ of the form $|\phi_1|^2$ for some
function $\phi_1$ of unit norm belonging to $B^{\beta}_{p,p}([0,1]^k)$. Note that $\Pi^{\Phi}_{\{1\}} = |\phi_1|^2$ also
belongs to $B^{\beta}_{p,p}([0,1]^k)$ and up to the logarithmic factor, the rate we get is the
usual one for estimating a density in $B^{\beta}_{p,p}([0,1]^k)$.
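Let us now establish rates of convergence towards more general determinantal
densities. To do so, we also need to make a posteriori assumptions on $\lambda$. More
precisely, we assume that $\lambda$ belongs to a set of the form

$$\Big\{\lambda\in\Lambda\ \Big|\ \sum_{j'>j}\lambda_{j'}^2\le Aj^{-a},\ \forall j\ge 1\Big\}$$

or

$$\Big\{\lambda\in\Lambda\ \Big|\ \sum_{j'>j}\lambda_{j'}^2\le Ae^{-aj},\ \forall j\ge 1\Big\}$$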



for some $A, a > 0$. We get the following result, the proof of which is delayed to
Section 5.7.

Proposition 9. There exists an estimator $\hat\Pi$ such that for all $A, a, R > 0$,
$\beta\in(0,+\infty)^k$ and $p\in(0,+\infty]$ such that $\bar\beta > k\big[(p^{-1}-2^{-1})\vee 0\big]$, we have



(4.8) sup E 

(*,A)e< p (i?)xA^ a) ( J 4) 



sup E 



/i z (n*'\n) 



< c 



< c 



2gfi 

logn^ (2j+k)(l + a.) 

n 



( logn )2+fc/(2/3)\ 



2j 
20 + ti- 



ll 



) 



where $C$ denotes some positive number depending on $k$, $A$, $R$, $p$, $a$ and $\beta$ only.

Obtaining lower bounds for these rates is a difficult task that goes beyond
the scope of this paper. One of the main difficulties lies in the fact that the
parametrization of the class $\mathcal{D}$ by the pairs $(\Phi,\lambda)\in\mathbb{U}\times\Lambda$ is not one-to-one
and we refer the reader to the discussion of Section 4.1. Consequently, we do
not know whether these bounds are optimal or not. We nevertheless believe that
the logarithmic factor appearing in (4.8) is only due to our approach and could
probably be removed.



5. Proofs 



5.1. Proof of Proposition 1. For $(x_1,\dots,x_k)\in\mathcal{X}^k$ and not necessarily
distinct elements $\ell_1,\dots,\ell_k\in\{1,\dots,k\}$, let us set

/ Mxe,) 
'■. _ 



D^,(X£ 1 ,. ..,X£ k 



and 



/ tpiixe^ ... tpkixiA 

\ 1pl{xi k ) ... 1pk(x£ h ) 
Let $\mathfrak{S}_k$ be the group of permutations of $\{1,\dots,k\}$ and for $\sigma\in\mathfrak{S}_k$, let $\varepsilon(\sigma)$ be
the sign of $\sigma$. For all $\tau\in\mathfrak{S}_k$, we have



det 



«<«;»ij=i,...J = E(- 1 ) £(CT) fl^'^)) 

treSfc i=l 

cree fc ■'■*'* i=i 



cr66fe 



8=1 



c^ fc (a;) 



(5.1) = / det [L>$(x t(1) ,...,x t(a .))] det [^(x r(1) , . . . , x T(fe) )] dfj,® k (x). 

When (^i, . . . , E {1, . . . , /c} fe is not the image of {1, ... , A;} by a permutation 
(that is when the ti are not distinct), note that det [^(x^, . . . ,X£, )] = 0. 
Besides, since (15. ip is true whatever r, we have 



det 



i> ^j))i,i=l,...,& 



7J E / fc det [^(^(l),---,^))] det [^(^r(i)>---.a;r(Jfe))] dn® k {x) 
- E / det[ J D < j,(x £l ,...,x 4 )]det[^(x £l ,...,x 4 )] ( i/i 0fc (x) 



,...,4)e{i,...,fc}fe' 



1 

k\ j xl . 



(4,...,4)e{i,...,fc} fe 



E(-i) £(a) n^(^)^)(^) 



j=i 



dn® k {x) 



k / k 



' (Te6 fc j=i V=i / 



{l,...,fc},{:ri,...,:r fc } 



det [% 1 ,.., !r) ,},{i 1 ..., A }] dfj® fc (s) 



det 



1 { fc},Q 



det [^^{i,...^}] dL(a). 
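The identity just proved can be checked numerically in a toy setting. The sketch below is only an illustration under assumptions that are not in the text: the space is a finite set of four points, $\mu$ is the counting measure, and the functions are real-valued (so no complex conjugation is needed). The helper names `gram_determinant` and `averaged_determinants` are hypothetical.

```python
from itertools import permutations, product
from math import factorial


def det(matrix):
    """Determinant by the Leibniz (permutation) formula; fine for tiny matrices."""
    size = len(matrix)
    total = 0.0
    for perm in permutations(range(size)):
        inversions = sum(
            1 for a in range(size) for b in range(a + 1, size) if perm[a] > perm[b]
        )
        term = -1.0 if inversions % 2 else 1.0
        for row, col in enumerate(perm):
            term *= matrix[row][col]
        total += term
    return total


def gram_determinant(phi, psi):
    """det[<phi_i, psi_j>] for real functions on a finite set with counting measure."""
    k = len(phi)
    gram = [
        [sum(a * b for a, b in zip(phi[i], psi[j])) for j in range(k)]
        for i in range(k)
    ]
    return det(gram)


def averaged_determinants(phi, psi):
    """(1/k!) * sum over all k-tuples x of det[phi_j(x_i)] * det[psi_j(x_i)]."""
    k, n_points = len(phi), len(phi[0])
    total = 0.0
    for x in product(range(n_points), repeat=k):
        det_phi = det([[phi[j][xi] for j in range(k)] for xi in x])
        det_psi = det([[psi[j][xi] for j in range(k)] for xi in x])
        total += det_phi * det_psi
    return total / factorial(k)


# Two hypothetical families of k = 2 functions on a 4-point space.
phi = [[1.0, 2.0, 0.5, -1.0], [0.0, 1.0, 3.0, 2.0]]
psi = [[2.0, -1.0, 1.0, 0.0], [1.0, 1.0, 0.5, 4.0]]

lhs = gram_determinant(phi, psi)
rhs = averaged_determinants(phi, psi)
print(lhs, rhs)  # both equal 6.75
```

Tuples with a repeated point contribute zero because the corresponding matrices have two identical rows, which mirrors the remark made in the proof about non-distinct indices.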
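5.2. Proof of Proposition 2. The first equality is clear since by (3.1) the
Hellinger affinity between $\Pi^{\Phi}_J$ and $\Pi^{\Psi}_J$ equals

$$\rho(\Pi^{\Phi}_J, \Pi^{\Psi}_J) = \int \sqrt{\Pi^{\Phi}_J(\alpha)\, \Pi^{\Psi}_J(\alpha)}\, dL(\alpha) = \int \big|\det[\Phi_{\alpha,J}]\big|\, \big|\det[\Psi_{\alpha,J}]\big|\, dL(\alpha).$$

For the second part we use Proposition 1 and get

$$1 - \int \big|\det[\Phi_{\alpha,J}]\big|\, \big|\det[\Psi_{\alpha,J}]\big|\, dL(\alpha) \le 1 - \Big| \int \det[\Phi_{\alpha,J}]\, \overline{\det[\Psi_{\alpha,J}]}\, dL(\alpha) \Big| = 1 - \Big| \det\big[\langle \phi_i, \psi_j \rangle\big]_{i,j \in J} \Big|.$$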

Let us now prove the last inequality and set $a = 2/5$. First note that if
$\sum_{j \in J} \|\phi_j - \psi_j\|^2 > a$ then the result is true since the Hellinger distance is
bounded by 1. We may therefore assume that

$$\sum_{j \in J} \|\phi_j - \psi_j\|^2 \le a. \tag{5.2}$$

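In the remaining part of the proof we consider the linear space $\mathcal{M}_{J \times J}(\mathbb{C})$ of
$|J| \times |J|$ matrices indexed by $J$ with entries in $\mathbb{C}$. We endow $\mathcal{M}_{J \times J}(\mathbb{C})$ with
the Hilbert-Schmidt norm defined by

$$\|A\| = \Big( \sum_{i,j \in J} |A_{i,j}|^2 \Big)^{1/2}.$$

It is well-known that this norm is sub-multiplicative in the sense that for all
$A, B \in \mathcal{M}_{J \times J}(\mathbb{C})$, $\|AB\| \le \|A\|\, \|B\|$ and it also satisfies

$$|\mathrm{Tr}(AB)| \le \|A\|\, \|B\|. \tag{5.3}$$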
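Both properties of the Hilbert-Schmidt norm are easy to probe numerically. The following sketch is an illustration only: it draws random $3 \times 3$ complex matrices (the size, the seed and the helper names are arbitrary choices, not taken from the text) and counts violations of sub-multiplicativity and of (5.3).

```python
import random


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)] for i in range(n)]


def hs_norm(a):
    """Hilbert-Schmidt norm: square root of the sum of squared moduli of the entries."""
    return sum(abs(x) ** 2 for row in a for x in row) ** 0.5


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def random_complex_matrix(n, rng):
    return [
        [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
        for _ in range(n)
    ]


rng = random.Random(0)
violations = 0
for _ in range(200):
    a = random_complex_matrix(3, rng)
    b = random_complex_matrix(3, rng)
    ab = matmul(a, b)
    bound = hs_norm(a) * hs_norm(b)
    # Small tolerance to absorb floating-point rounding.
    if hs_norm(ab) > bound + 1e-12 or abs(trace(ab)) > bound + 1e-12:
        violations += 1

print(violations)  # no violation is expected
```

Both inequalities follow from the Cauchy-Schwarz inequality applied to the entries, so the counter should stay at zero whatever the seed.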

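One can decompose the matrix $A = (\langle \phi_i, \psi_j \rangle)_{i,j \in J}$ as $A = D + B$ where $D$ is
diagonal with entries $D_{i,i} = \langle \phi_i, \psi_i \rangle$ and $B = A - D$. Since $\|\phi_i\| = \|\psi_i\| = 1$
for all $i$, under (5.2),

$$|D_{i,i}| \ge \Re(D_{i,i}) = 1 - \frac{\|\phi_i - \psi_i\|^2}{2} \ge 1 - \frac{a}{2} > 0, \quad \forall i \in J \tag{5.4}$$

and hence, $D$ is non-singular. We may therefore write

$$\det A = \det D\, \det(I + M) \quad \text{with} \quad M = D^{-1} B$$

and since for all $i$, $\sum_{j \in J} |\langle \phi_i, \psi_j \rangle|^2 \le \|\phi_i\|^2 = 1$,

$$\|M\|^2 = \sum_{i \in J} \sum_{j \in J,\, j \ne i} \frac{|\langle \phi_i, \psi_j \rangle|^2}{|D_{i,i}|^2} \le \sum_{i \in J} \frac{1 - |D_{i,i}|^2}{|D_{i,i}|^2} = \Delta_J^2,$$

$$\Delta_J^2 \le \frac{1 - a/4}{(1 - a/2)^2} \sum_{i \in J} \|\phi_i - \psi_i\|^2 \le \frac{a(1 - a/4)}{(1 - a/2)^2} < 1. \tag{5.5}$$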



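The matrix $I + M$ is therefore non-singular and we may write $I + M = e^{L}$ for
some matrix $L \in \mathcal{M}_{J \times J}(\mathbb{C})$. In fact,

$$L = M - \frac{M^2}{2} + \sum_{p \ge 3} (-1)^{p-1} \frac{M^p}{p}$$

where the series converges normally in $(\mathcal{M}_{J \times J}(\mathbb{C}), \|\cdot\|)$. Moreover,

$$\det(I + M) = e^{\mathrm{Tr}(L)}.$$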
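As a small sanity check of the relation $\det(I + M) = e^{\mathrm{Tr}(L)}$, the sketch below sums the traces of the logarithm series for one hypothetical $2 \times 2$ real matrix $M$ with zero diagonal and Hilbert-Schmidt norm below 1. The matrix, the truncation level and the function names are illustrative assumptions only.

```python
from math import exp


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)] for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def trace_of_log_series(m, terms=60):
    """Tr(L) for L = sum_{p>=1} (-1)^(p-1) M^p / p, truncated after `terms` terms."""
    power = [row[:] for row in m]  # holds M^p, starting with p = 1
    total = 0.0
    for p in range(1, terms + 1):
        total += (-1) ** (p - 1) * trace(power) / p
        power = matmul(power, m)
    return total


# Hypothetical matrix with zero diagonal, so that Tr(M) = 0 as in the proof.
m = [[0.0, 0.3], [-0.2, 0.0]]

det_i_plus_m = (1 + m[0][0]) * (1 + m[1][1]) - m[0][1] * m[1][0]
tr_l = trace_of_log_series(m)
print(det_i_plus_m, exp(tr_l))  # the two values agree up to rounding
```

Because the norm of this $M$ is about 0.36, the terms of the series decay geometrically and 60 terms are far more than needed.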

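Since the mapping $L \mapsto \mathrm{Tr}(L)$ is linear and continuous on $(\mathcal{M}_{J \times J}(\mathbb{C}), \|\cdot\|)$ and
since $\mathrm{Tr}(M) = 0$,

$$\mathrm{Tr}(L) = -\frac{\mathrm{Tr}(M^2)}{2} + \sum_{p \ge 3} (-1)^{p-1} \frac{\mathrm{Tr}(M^p)}{p}.$$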

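By using (5.3) and the sub-multiplicative property of the Hilbert-Schmidt norm,
we get

$$\Big| \mathrm{Tr}(L) + \frac{\mathrm{Tr}(M^2)}{2} \Big| \le \sum_{p \ge 3} \frac{\|M\|^p}{p} \le \frac{\|M\|^3}{3(1 - \|M\|)}$$

and thus, by using that $\|M\| \le \Delta_J$,

$$\Re(\mathrm{Tr}(L)) \ge -\Re\Big( \frac{\mathrm{Tr}(M^2)}{2} \Big) - \frac{\|M\|^3}{3(1 - \|M\|)} \ge -\frac{\|M\|^2}{2} \Big( 1 + \frac{2\|M\|}{3(1 - \|M\|)} \Big) \ge -\frac{\Delta_J^2}{2} \Big( 1 + \frac{2\Delta_J}{3(1 - \Delta_J)} \Big).$$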

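This inequality together with the fact that $\log u \ge -(1 - u)/u$ for all $u > 0$
leads to

$$|\det A| = |\det D|\, e^{\Re(\mathrm{Tr}(L))} = \exp\Big[ \frac{1}{2} \sum_{i \in J} \log\big(|D_{i,i}|^2\big) + \Re(\mathrm{Tr}(L)) \Big] \ge \exp\Big[ -\frac{\Delta_J^2}{2} - \frac{\Delta_J^2}{2} \Big( 1 + \frac{2\Delta_J}{3(1 - \Delta_J)} \Big) \Big] = \exp\big[ -c(\Delta_J)\, \Delta_J^2 \big]$$

with $c(u) = 1 + u/[3(1 - u)]$ for $u \in [0, 1)$. By using (5.5) and the fact that $c$ is
increasing, we get

$$h^2(\Pi^{\Phi}_J, \Pi^{\Psi}_J) \le 1 - |\det A| \le 1 - \exp\Big[ -c\Big( \frac{\sqrt{a(1 - a/4)}}{1 - a/2} \Big) \frac{1 - a/4}{(1 - a/2)^2} \sum_{i \in J} \|\phi_i - \psi_i\|^2 \Big] \le \frac{5}{2} \sum_{i \in J} \|\phi_i - \psi_i\|^2,$$

the last inequality being true with our choice of $a$.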



5.3. Proof of Proposition 3. Let us set

$$R = \int_T P_t\, q(t)\, dm(t).$$

Since $h^2(P, Q) \le 2h^2(P, R) + 2h^2(R, Q)$, it remains to bound each of those
terms from above. The measures $\nu$ and $m$ being $\sigma$-finite, we may apply the Fubini-Tonelli
theorem. By using the Cauchy-Schwarz inequality, we bound the first
term as follows.



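$$\begin{aligned}
2h^2(P, R) &= \int \big( \sqrt{P} - \sqrt{R} \big)^2 d\nu = \int \frac{(P - R)^2}{(\sqrt{P} + \sqrt{R})^2}\, d\nu \\
&\le \int \frac{\big( \int_T P_t\, (p(t) - q(t))\, dm(t) \big)^2}{P + R}\, d\nu \\
&= \int \frac{\big( \int_T \sqrt{P_t}\, (\sqrt{p(t)} - \sqrt{q(t)})\, \sqrt{P_t}\, (\sqrt{p(t)} + \sqrt{q(t)})\, dm(t) \big)^2}{P + R}\, d\nu \\
&\le \int \Big[ \int_T P_t\, \big( \sqrt{p(t)} - \sqrt{q(t)} \big)^2 dm(t) \times \frac{\int_T P_t\, (\sqrt{p(t)} + \sqrt{q(t)})^2\, dm(t)}{P + R} \Big] d\nu \\
&\le 2 \int_T \big( \sqrt{p(t)} - \sqrt{q(t)} \big)^2 dm(t) = 4h^2(p, q).
\end{aligned}$$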
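Let us now turn to the second term. By using similar arguments,

$$\begin{aligned}
2h^2(R, Q) &\le \int \frac{\big( \int_T (P_t - Q_t)\, q(t)\, dm(t) \big)^2}{R + Q}\, d\nu \\
&= \int \frac{\big( \int_T (\sqrt{P_t} - \sqrt{Q_t}) \sqrt{q(t)} \times (\sqrt{P_t} + \sqrt{Q_t}) \sqrt{q(t)}\, dm(t) \big)^2}{R + Q}\, d\nu \\
&\le \int \Big[ \int_T \big( \sqrt{P_t} - \sqrt{Q_t} \big)^2 q(t)\, dm(t) \times \frac{\int_T (\sqrt{P_t} + \sqrt{Q_t})^2\, q(t)\, dm(t)}{R + Q} \Big] d\nu \\
&\le 4 \int_T h^2(P_t, Q_t)\, q(t)\, dm(t).
\end{aligned}$$

We conclude by adding these two upper bounds.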
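The resulting bound, read here as $h^2(P, Q) \le 4h^2(p, q) + 4\int_T h^2(P_t, Q_t)\, q(t)\, dm(t)$, can be illustrated on finite mixtures. In the sketch below all distributions are hypothetical, $T$ has three elements with $m$ the counting measure, and $h^2$ is taken with the normalisation $h^2 = \frac{1}{2}\sum(\sqrt{\cdot} - \sqrt{\cdot})^2$, which is an assumption consistent with $h \le 1$.

```python
from math import sqrt


def hellinger2(p, q):
    """Squared Hellinger distance between two discrete distributions."""
    return 0.5 * sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q))


def mixture(components, weights):
    """Mixture sum_t weights[t] * components[t] on a common finite sample space."""
    size = len(components[0])
    return [sum(w * c[x] for w, c in zip(weights, components)) for x in range(size)]


# Hypothetical mixing weights p, q and component distributions P_t, Q_t.
p_w = [0.2, 0.5, 0.3]
q_w = [0.4, 0.4, 0.2]
P_t = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
Q_t = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.25, 0.25, 0.5]]

P = mixture(P_t, p_w)
Q = mixture(Q_t, q_w)

lhs = hellinger2(P, Q)
rhs = 4 * hellinger2(p_w, q_w) + 4 * sum(
    w * hellinger2(a, b) for w, a, b in zip(q_w, P_t, Q_t)
)
print(lhs <= rhs)
```

The intermediate mixture $R$ of the proof corresponds to `mixture(P_t, q_w)`; it does not need to be computed to evaluate the two sides of the final bound.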

5.4. Proof of Proposition 4. Inequality (3.11) derives from (3.9), (3.10) and
Proposition 3. Hence, it remains to prove (3.9) and (3.10).

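Let us prove (3.9). To do so, we set $a^{-1} = e/[2(e - 1)] < 1$ and prove the
stronger inequality

$$h^2(p^{\lambda}, p^{\gamma}) \le a^{-1} \big[ |\lambda - \gamma|^2 + |\bar\lambda - \bar\gamma|^2 \big]$$

(with $\bar\lambda_j = \sqrt{1 - \lambda_j^2}$ and $\bar\gamma_j = \sqrt{1 - \gamma_j^2}$). If there exists $j \ge 1$ such that $|\lambda_j - \gamma_j|^2 + |\bar\lambda_j - \bar\gamma_j|^2 > a$ then the result is
clear since $h^2(p^{\lambda}, p^{\gamma}) \le 1$. Otherwise,

$$h^2(p^{\lambda}, p^{\gamma}) = 1 - \sum_{J \in \mathcal{P}} \prod_{j \in J} \lambda_j \gamma_j \prod_{j \notin J} \bar\lambda_j \bar\gamma_j = 1 - \prod_{j \ge 1} \big( \lambda_j \gamma_j + \bar\lambda_j \bar\gamma_j \big) = 1 - \prod_{j \ge 1} \Big[ 1 - \frac{1}{2} \big( |\lambda_j - \gamma_j|^2 + |\bar\lambda_j - \bar\gamma_j|^2 \big) \Big]$$

and by using that $\log(1 - u) \ge [2a^{-1} \log(1 - a/2)]\, u$ for all $u \in [0, a/2]$, we
get

$$h^2(p^{\lambda}, p^{\gamma}) \le 1 - \exp\Big[ \frac{\log(1 - a/2)}{a} \sum_{j \ge 1} \big( |\lambda_j - \gamma_j|^2 + |\bar\lambda_j - \bar\gamma_j|^2 \big) \Big] \le -\frac{\log(1 - a/2)}{a} \sum_{j \ge 1} \big( |\lambda_j - \gamma_j|^2 + |\bar\lambda_j - \bar\gamma_j|^2 \big) = a^{-1} \big[ |\lambda - \gamma|^2 + |\bar\lambda - \bar\gamma|^2 \big]$$

with our choice of $a$.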
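The product formula for the Hellinger affinity and the resulting bound can be checked by brute force on short sequences. The sketch below rests on an assumption about notation that should be verified against Section 3: the weight of a subset $J$ is taken to be $\prod_{j \in J} \lambda_j^2 \prod_{j \notin J} (1 - \lambda_j^2)$. The sequences and helper names are hypothetical.

```python
from itertools import combinations
from math import e, prod, sqrt


def subset_weight(lam, subset):
    """Assumed weight of a subset J: prod_{j in J} lam_j^2 * prod_{j not in J} (1 - lam_j^2)."""
    return prod(l * l if j in subset else 1 - l * l for j, l in enumerate(lam))


def hellinger2_bruteforce(lam, gam):
    """1 minus the Hellinger affinity, summing over all subsets of indices."""
    n = len(lam)
    affinity = 0.0
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            affinity += sqrt(subset_weight(lam, chosen) * subset_weight(gam, chosen))
    return 1 - affinity


def hellinger2_product(lam, gam):
    """Closed form: 1 - prod_j (lam_j * gam_j + sqrt((1 - lam_j^2) * (1 - gam_j^2)))."""
    return 1 - prod(l * g + sqrt((1 - l * l) * (1 - g * g)) for l, g in zip(lam, gam))


# Hypothetical sequences with three non-trivial coordinates.
lam = [0.9, 0.5, 0.2]
gam = [0.8, 0.6, 0.0]

brute = hellinger2_bruteforce(lam, gam)
closed = hellinger2_product(lam, gam)

# Right-hand side of the stronger inequality with 1/a = e / (2 (e - 1)).
distance_term = sum(
    (l - g) ** 2 + (sqrt(1 - l * l) - sqrt(1 - g * g)) ** 2 for l, g in zip(lam, gam)
)
bound = e / (2 * (e - 1)) * distance_term
print(brute, closed, bound)
```

With this choice of $a$ one has $1 - a/2 = 1/e$, so $\log(1 - a/2) = -1$, which is what makes the constant in front of the sum equal to $a^{-1}$.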



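Let us now prove (3.10). By using Proposition 2 we have that for all $J \in \mathcal{P}$,
$J \ne \varnothing$,

$$h^2(\Pi^{\Phi}_J, \Pi^{\Psi}_J) \le \frac{5}{2} \sum_{j \in J} \|\phi_j - \psi_j\|^2. \tag{5.6}$$

With the convention $\sum_{\varnothing} = 0$, this inequality remains true when $J = \varnothing$ since in
this case $\Pi^{\Phi}_{\varnothing} = \Pi^{\Psi}_{\varnothing} = \delta_{\varnothing}$ and thus $h^2(\Pi^{\Phi}_{\varnothing}, \Pi^{\Psi}_{\varnothing}) = 0$. We may therefore write

$$\begin{aligned}
\sum_{J \in \mathcal{P}} p^{\gamma}_J\, h^2(\Pi^{\Phi}_J, \Pi^{\Psi}_J)
&\le \frac{5}{2} \sum_{J \in \mathcal{P}} p^{\gamma}_J \sum_{j \in J} \|\phi_j - \psi_j\|^2 \\
&\le \frac{5}{2} \sum_{j \ge 1} \|\phi_j - \psi_j\|^2 \sum_{J \in \mathcal{P},\, j \in J} p^{\gamma}_J \\
&\le \frac{5}{2} \sum_{j \ge 1} \|\phi_j - \psi_j\|^2 \sum_{J \in \mathcal{P},\, j \in J}\ \prod_{j' \in J} \gamma_{j'}^2 \prod_{j' \notin J} (1 - \gamma_{j'}^2) \\
&= \frac{5}{2} \sum_{j \ge 1} \gamma_j^2\, \|\phi_j - \psi_j\|^2 \prod_{j' \ge 1,\, j' \ne j} \big( \gamma_{j'}^2 + (1 - \gamma_{j'}^2) \big) \\
&= \frac{5}{2} \sum_{j \ge 1} \gamma_j^2\, \|\phi_j - \psi_j\|^2
\end{aligned}$$

as claimed.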



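5.5. Proof of Theorem 1. The proof is based on suitable choices of $(\Pi_{\mathfrak{m}})_{\mathfrak{m} \in \mathfrak{M}}$
and $\Delta$ involved in Proposition 5. Let $\psi_0$ be some arbitrary element of $H$ with
unit norm. The function $\psi_0$ will play no role in our proof and is just convenient
for our definition of the family $(\Pi_{\mathfrak{m}})_{\mathfrak{m} \in \mathfrak{M}}$. For $j \ge 1$, we set

$$\Lambda_j[1/n] = \big\{ \gamma \in \Lambda \ \big|\ \gamma_{j'} \in \{i/n,\ i = 1, \ldots, n\} \text{ if } j' \le j,\ \gamma_{j'} = 0 \text{ otherwise} \big\}$$

and

$$\mathfrak{M} = \bigcup_{j \ge 1}\ \bigcup_{m_1 \in \mathcal{M}, \ldots, m_j \in \mathcal{M}} \Big( \prod_{j'=1}^{j} H_{m_{j'}}[1/\sqrt{n}] \times \prod_{j' > j} \{\psi_0\} \Big) \times \Lambda_j[1/n].$$

An element $\mathfrak{m} \in \mathfrak{M}$ is therefore of the form $\mathfrak{m} = (\Psi', \gamma')$ with $\gamma' = (\gamma'_{j'})_{j' \ge 1}$
in $\Lambda_j[1/n] \subset \Lambda$ and $\Psi' = (\psi'_{j'})_{j' \ge 1}$, the $j$ first components being in
maximal $1/\sqrt{n}$-separated subsets of the $H_m$. To each $\mathfrak{m} = (\Psi', \gamma') \in \mathfrak{M}$, we
can associate the density $\Pi_{\mathfrak{m}} = \Pi^{\Psi', \gamma'}$ by (3.3) and by doing so, define our
family $(\Pi_{\mathfrak{m}})_{\mathfrak{m} \in \mathfrak{M}}$. Let us now define $\Delta$. We take for $\mathfrak{m} = \mathfrak{m}(j, m_1, \ldots, m_j) \in \mathfrak{M}$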



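$$e^{-\Delta(\mathfrak{m})} = \frac{1}{(2n)^j} \prod_{j'=1}^{j} \frac{\pi(m_{j'})}{\big| H_{m_{j'}}[1/\sqrt{n}] \big|}$$

which satisfies

$$\sum_{\mathfrak{m} \in \mathfrak{M}} e^{-\Delta(\mathfrak{m})} = \sum_{j \ge 1} 2^{-j} \sum_{\gamma \in \Lambda_j[1/n]} \frac{1}{n^j} \sum_{m_1 \in \mathcal{M}} \cdots \sum_{m_j \in \mathcal{M}} \pi(m_1) \cdots \pi(m_j) \times \sum_{\psi_1 \in H_{m_1}[1/\sqrt{n}]} \frac{1}{\big| H_{m_1}[1/\sqrt{n}] \big|} \cdots \sum_{\psi_j \in H_{m_j}[1/\sqrt{n}]} \frac{1}{\big| H_{m_j}[1/\sqrt{n}] \big|} \le 1.$$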
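Let us now fix $(\Phi, \lambda) \in \mathcal{U} \times \Lambda$ with $\lambda$ non-increasing. For any choice of $j \ge 1$ and
$m_1, \ldots, m_j \in \mathcal{M}$, let $(\Psi, \gamma) \in \mathcal{U} \times \Lambda$ be defined as follows. For all $j' \in \{1, \ldots, j\}$,
$\gamma_{j'} = \lambda_{j'}$ and $\psi_{j'}$ is some best approximation of $\phi_{j'}$ by an element of $H_{m_{j'}}$. For
$j' > j$, $\gamma_{j'} = 0$ and $\psi_{j'} = \psi_0$. By using (3.11), we have

$$\begin{aligned}
h^2(\Pi^{\Phi,\lambda}, \Pi^{\Psi,\gamma})
&\le 2 \big[ |\lambda - \gamma|^2 + |\bar\lambda - \bar\gamma|^2 \big] + 5 \sum_{j' \ge 1} \gamma_{j'}^2\, \|\phi_{j'} - \psi_{j'}\|^2 \\
&\le 2 \sum_{j' > j} \Big[ \lambda_{j'}^2 + \Big( 1 - \sqrt{1 - \lambda_{j'}^2} \Big)^2 \Big] + 5 \sum_{j'=1}^{j} \|\phi_{j'} - \psi_{j'}\|^2 \\
&\le 2 \sum_{j' > j} \Big[ \lambda_{j'}^2 + \Big( \frac{\lambda_{j'}^2}{1 + \sqrt{1 - \lambda_{j'}^2}} \Big)^2 \Big] + 5 \sum_{j'=1}^{j} \|\phi_{j'} - \psi_{j'}\|^2 \\
&\le 4 \sum_{j' > j} \lambda_{j'}^2 + 5 \sum_{j'=1}^{j} \|\phi_{j'} - \psi_{j'}\|^2.
\end{aligned} \tag{5.7}$$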



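The family $\mathfrak{M}$ provides an approximation of $(\Psi, \gamma)$ in the sense that there exists
$\mathfrak{m} = (\Psi', \gamma') \in \mathfrak{M}$ such that

$$|\gamma - \gamma'|^2 + |\bar\gamma - \bar\gamma'|^2 = \sum_{j'=1}^{j} \big( |\gamma_{j'} - \gamma'_{j'}|^2 + |\bar\gamma_{j'} - \bar\gamma'_{j'}|^2 \big) \le \frac{2j}{n} \tag{5.8}$$

and

$$\sum_{j' \ge 1} \gamma'^{\,2}_{j'}\, \|\psi_{j'} - \psi'_{j'}\|^2 \le \sum_{j'=1}^{j} \|\psi_{j'} - \psi'_{j'}\|^2 \le \frac{j}{n}. \tag{5.9}$$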



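By putting these bounds together, we obtain that for all $j \ge 1$ and $m_1, \ldots, m_j \in \mathcal{M}$,
there exists $\mathfrak{m} \in \mathfrak{M}$ such that

$$h^2(\Pi^{\Phi,\lambda}, \Pi_{\mathfrak{m}}) \le 2h^2(\Pi^{\Phi,\lambda}, \Pi^{\Psi,\gamma}) + 2h^2(\Pi^{\Psi,\gamma}, \Pi_{\mathfrak{m}}) \le 8 \sum_{j' > j} \lambda_{j'}^2 + 10 \sum_{j'=1}^{j} \|\phi_{j'} - \psi_{j'}\|^2 + \frac{18j}{n}.$$

Consequently, for some universal constant $C' > 0$,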



C' inf 

mean 



^(n*' A ,n m ) + 



A(m) 



11 



< inf inf 

j>l mi,...,mjEM 



j'>3 



£ 

i'=i 



2 1 

l^j' ~ Vy|| + - ( lo § 



+ log 



7r(mj 
5.6. Proof of Proposition 6. For all $m \in \mathcal{M}$, $H_m[1/\sqrt{n}]$ is a $1/\sqrt{n}$-separated
subset of the unit ball of a finite-dimensional linear space $S_m$ on $\mathbb{R}$ of dimension
$D_m$. Consequently, for all $m \in \mathcal{M}$,

$$\log \big| H_m[1/\sqrt{n}] \big| \le D_m \log(2\sqrt{n} + 1)$$

(see for example Lemma 4 in Birgé (2006)). Let $a < 1$. If $\inf_{\psi \in S_m} \|\phi - \psi\| \ge a$,
then by the triangular inequality, for all $\psi'$ in $H_m$,

$$\|\phi - \psi'\| \le 2 \le \frac{2}{a} \inf_{\psi \in S_m} \|\phi - \psi\|.$$

Otherwise, there exists $\psi \in S_m$ such that $\|\phi - \psi\| < a$. Hence, $\|\psi\| \ge \|\phi\| - \|\phi - \psi\| \ge 1 - a > 0$. In particular, $\psi \ne 0$ and we may set $\psi' = \psi / \|\psi\| \in H_m$.
We have



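$$\|\phi - \psi'\| = \Big\| \phi - \frac{\psi}{\|\psi\|} \Big\| \le \Big\| \phi - \frac{\phi}{\|\psi\|} \Big\| + \Big\| \frac{\phi}{\|\psi\|} - \frac{\psi}{\|\psi\|} \Big\| \le \frac{\big| \|\psi\| - 1 \big|}{\|\psi\|} + \frac{\|\phi - \psi\|}{\|\psi\|} \le \frac{2\, \|\phi - \psi\|}{1 - a}.$$

We get the result by choosing $a = 1/2$.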
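The normalisation step can be probed numerically. The sketch below, given only as an illustration, draws random unit vectors $\phi$ in $\mathbb{R}^5$ and nearby vectors $\psi$ (the dimension, the perturbation size and the seed are arbitrary assumptions), and checks the inequality $\|\phi - \psi/\|\psi\|\| \le 2\|\phi - \psi\| / \|\psi\|$ that underlies the last display.

```python
import random
from math import sqrt


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return sqrt(sum(x * x for x in v))


rng = random.Random(1)
worst_ratio = 0.0
for _ in range(1000):
    # Random unit vector phi.
    phi = [rng.gauss(0, 1) for _ in range(5)]
    scale = norm(phi)
    phi = [x / scale for x in phi]
    # Perturbation of norm at most 0.2 * sqrt(5) < 1, so psi cannot vanish.
    psi = [x + rng.uniform(-0.2, 0.2) for x in phi]
    psi_norm = norm(psi)
    psi_unit = [x / psi_norm for x in psi]

    dist_before = norm([a - b for a, b in zip(phi, psi)])
    dist_after = norm([a - b for a, b in zip(phi, psi_unit)])
    if dist_before > 0:
        # Ratio of the left-hand side to ||phi - psi|| / ||psi||; it should not exceed 2.
        worst_ratio = max(worst_ratio, dist_after * psi_norm / dist_before)

print(worst_ratio)
```

With $a = 1/2$, the two cases of the proof give the same constant, since $2/a = 2/(1 - a) = 4$.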

5.7. Proofs of Propositions 8 and 9. By using the collections of linear spaces
$\mathbb{S}$ and our choice of $\pi$, and by using some classical optimization with respect to
$m \in \mathcal{M}$, we get that for all $j \ge 1$

$$\theta(\mathbb{S}, \pi, \phi_j) \le C (\log n / n)^{2\bar\beta/(2\bar\beta + k)}$$

where $C$ is a positive constant depending on $R$, $k$, $\beta$ and $p$. Up to the logarithmic
factor, this bound corresponds to the usual estimation rate over $\mathcal{B}^{\beta}_{p,p}([0,1]^k)$.
When $\lambda = (\mathbb{1}_{j \le 1})_{j \ge 1}$, $\Pi^{\Phi,\lambda} = \Pi^{\Phi}_{\{1\}}$ and Corollary 1 leads to Proposition 8.

For all $\lambda \in \Lambda^{(P)}_{\alpha}(A)$, we get from Corollary 1 that

$$C'\, \mathbb{E}\big[ h^2(\Pi, \hat\Pi) \big] \le \inf_{j \ge 1} \Big[ j (\log n / n)^{2\bar\beta/(2\bar\beta + k)} + A j^{-\alpha} \Big].$$

The minimum is achieved for $j$ of order $(n / \log n)^{2\bar\beta/[(2\bar\beta + k)(1 + \alpha)]}$, which leads
to the rate $(\log n / n)^{2\alpha\bar\beta/[(2\bar\beta + k)(1 + \alpha)]}$ as claimed. The other rate is obtained by
arguing similarly and by choosing $j$ of order

$$\frac{2\bar\beta}{\alpha(2\bar\beta + k)} \log n.$$
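The trade-off in $j$ can be made concrete with a short computation. The sketch below minimises $j\,(\log n/n)^{2\bar\beta/(2\bar\beta+k)} + A j^{-\alpha}$ over the integers by brute force and compares the minimiser with the order of magnitude announced above. All numerical values ($n$, $\bar\beta$, $k$, $\alpha$, $A$) and the function names are hypothetical and chosen only for illustration.

```python
from math import log


def risk_bound(j, rate, big_a, alpha):
    """Upper bound j * rate + A * j**(-alpha) appearing in the infimum over j."""
    return j * rate + big_a * j ** (-alpha)


def best_integer_index(rate, big_a, alpha, j_max):
    """Brute-force minimiser of the bound over j = 1, ..., j_max."""
    return min(range(1, j_max + 1), key=lambda j: risk_bound(j, rate, big_a, alpha))


# Hypothetical values chosen only for illustration.
n, beta_bar, k, alpha, big_a = 10**6, 1.0, 1, 1.0, 1.0

rate = (log(n) / n) ** (2 * beta_bar / (2 * beta_bar + k))
j_best = best_integer_index(rate, big_a, alpha, j_max=1000)

# Continuous minimiser of j * rate + A * j**(-alpha), obtained by zeroing the derivative.
j_star = (alpha * big_a / rate) ** (1 / (1 + alpha))
# Order of magnitude announced in the text.
j_order = (n / log(n)) ** (2 * beta_bar / ((2 * beta_bar + k) * (1 + alpha)))
print(j_best, j_star, j_order)
```

Since the bound is convex in $j$, the integer minimiser lies within one unit of the continuous one; with $\alpha A = 1$ the continuous minimiser coincides with the announced order, and in general the two differ only by the constant factor $(\alpha A)^{1/(1+\alpha)}$.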